Consolidate state.db tables into global.db (single user-global database)#1127
Conversation
…pe, strict marker)
…kspace-scoped - GLOBAL_SCHEMA absorbs architect/utils/annotations as-is + builders reshaped (workspace_path + composite PK); global migration v14 creates them on existing DBs - getDb()/getDbPath() alias global; remove ensureLocalDatabase + local v1-v12 ladder and the dead state.json migration (db/migrate.ts) - state.ts builder fns workspace-scoped (derive workspace_path from worktree on upsert; optional workspacePath on reads); lookupBuilderSpawningArchitect drops the per-file open - builder-read callsites scoped: builder-lookup, cleanup, tower-routes, overview
…kspace-scoped - GLOBAL_SCHEMA absorbs architect/utils/annotations as-is + builders reshaped (workspace_path + composite PK); global migration v14 creates them on existing DBs - getDb()/getDbPath() alias global; remove ensureLocalDatabase + local v1-v12 ladder - state.ts builder fns workspace-scoped (derive workspace_path from worktree on upsert; optional workspacePath on reads); lookupBuilderSpawningArchitect drops the per-file open - builder-read callsites scoped: builder-lookup, cleanup, tower-routes, overview
…+ dry-run preview - db/consolidate.ts: reusable upsert-if-newer engine (planMigration/applyMigration), defensive legacy reads, _consolidation marker, strict boot one-off (runBootConsolidation) - tower-server.ts: run the boot one-off before initInstances() - afx db consolidate <path> (dry-run default, --apply); afx tower start --dry-run-migration
…overview) - state.test: unified getDb/getGlobalDb mock on GLOBAL_SCHEMA; workspace-scoped builders - spec-755-lookup-builder: rewritten for one shared global.db + workspace_path scoping, incl. same-id-different-workspace isolation - overview.test: builders enrichment reads global.db (mocked getGlobalDbPath per test) - delete migrate.test.ts (tested removed db/migrate.js / state.json migration)
…IndexOf) deriveWorkspaceFromWorktree must use the LAST /.builders/ segment, not the first — robust when a worktree path is itself nested under another .builders/ dir. Tests cover clean copy, pre-v11 shape + workspace_path synthesis, same-id cross-workspace isolation, upsert-if-newer satellite conflict, pure-preview planMigration, marker + rename idempotency.
…ference) state.ts builder reads + deriveWorkspaceFromWorktree + lookupBuilderSpawningArchitect; consolidate.ts helpers/normalizers/classify; tower.ts preview db-open; db.ts consolidate header.
3 copies of the realpath-or-resolve normalizer (tower-utils normalizeWorkspacePath + inline canonicalize in state.ts and consolidate.ts) collapse into utils/workspace-path.ts — a leaf module (node builtins only) importable by both the data and server layers without a cycle. tower-utils re-exports it so its 6 importers are unchanged; state.ts and consolidate.ts import it aliased as canonicalize.
… db consolidate - clearRuntime(workspacePath): scope the builders DELETE by workspace_path — an unscoped delete on the shared global.db wiped every workspace's builders on 'afx workspace stop' (real regression). Thread workspacePath through stop.ts. utils/annotations (global, vestigial) left untouched to avoid cross-workspace wipe. - afx db consolidate: friendly no-op on a missing source (re-run after rename) and skip already-archived *.pre-merge-* files, instead of fatal / double-rename. - Regression tests: clearRuntime cross-workspace isolation; db consolidate repeat-run.
Architect Integration ReviewVerified both codex-flagged fixes and the underlying consolidation independently against PR head. High-quality PR for a high-impact structural change. Codex fix #1:
|
…er2) - send.ts detectCurrentBuilderId opened the RETIRED per-workspace state.db to resolve a builder's canonical id for 'afx send' (issue #1094 anti-spoofing). After migration that file is renamed, so afx send from a worktree would break. Repoint to global.db scoped by workspace_path (matches the lookupBuilderSpawningArchitect/overview.ts fixes). This was the LAST direct state.db open; audit-found, missed by all 3 consult models. - afx db consolidate dry-run called getGlobalDb() which eagerly creates/migrates global.db, violating the 'no writes' contract (codex iter2). Dry-run now opens global.db read-only (or in-memory if absent); only --apply uses the RW connection. - Tests: rewrite bugfix-774 for the global.db model incl. same-id cross-workspace scoping; new db-consolidate-command.test (dry-run side-effect-free + idempotency guards).
Architect Re-Review (post-audit + iter2)Verified both additional fixes from commit Audit-found (HIGH):
|
…roke CI) The file's deletion was only in the working tree (a stray git stash pop during dev unstaged it), so it lingered in HEAD importing the already-deleted db/migrate.js — passing locally but failing CI with ERR_MODULE_NOT_FOUND.
…onsolidation tests - send.ts detectWorkspaceRoot/detectCurrentBuilderId used lazy .+? (first-match) regexes → a nested worktree <repo>/.builders/a/.builders/b resolved the OUTER builder. Greedy .+ (last /.builders/), matching deriveWorkspaceFromWorktree's lastIndexOf. Nesting is an unsupported anti-pattern, but the parse is now consistent (codex iter3). + regression test. - Refreshed the stale detectCurrentBuilderId docstring (state.db/singleton → global.db) (claude iter3). - Added direct runBootConsolidation tests: first-boot migrate+marker+rename, marker-set no-op, strict mark-done-when-absent (codex iter3 coverage gap).
Architect CMAP — 3-way integration reviewThank you for this — it's an exceptionally thorough PIR. The root-cause writeup in #1118 is one of the clearest I've read, and the fact that the review iterations caught real, severe bugs (the I ran an independent 3-way integration review. All three models: APPROVE, HIGH confidence, no merge blockers.
What the panel affirmed
Low-severity follow-ups (none blocking)All safe to defer — noting them so they don't get lost:
One pre-merge step on my sideBecause this is the highest-blast-radius change we ship — it auto-migrates real Really nice work. 🙏 |
waleedkadous
left a comment
There was a problem hiding this comment.
Approving on the strength of the 3-way CMAP above — Gemini, Codex, and Claude all APPROVE at HIGH confidence with no blockers. Thank you again for how thorough this was; the review iterations catching the clearRuntime cross-workspace wipe is exactly what the process is for.
Note: merge is still pending one backed-up local-install smoke test on my side, since the migration touches live ~/.agent-farm state on first Tower boot — this approval just clears the review-required gate.
…(no session_id column) Pins the exact field shape flagged in review: architect has workspace_path (v11/#826) but not session_id (v12/#832 unreleased). Since #1118 deleted the migration ladder, consolidation is the sole reader and never runs v12 — SELECT * + defensive '?? null' must tolerate the absent column. This test would fail if the read ever became an explicit SELECT session_id.
…2 / session_id)
PIR Review: Consolidate state.db tables into global.db
Fixes #1118
Summary
Retired the per-workspace
state.dbfile and moved its four tables (architect,builders,utils,annotations) into the already user-global~/.agent-farm/global.db.getDb()nowreturns the single global connection, so architect/builder state no longer depends on Tower's
start-cwd — the root cause of "some architects missing their state after a restart."
builderswas reshaped to be workspace-scoped (composite
(workspace_path, id)PK), and a one-time,marker-gated boot migration (plus a manual
afx db consolidate <path>and anafx tower start --dry-run-migrationpreview) moves legacystate.dbfiles in — renamingsources to
*.pre-merge-<ts>, never deleting.Files Changed
(code, vs
mainmerge-base)packages/codev/src/agent-farm/db/schema.ts(+92/−…) —GLOBAL_SCHEMAabsorbs the four tables;buildersreshaped withworkspace_path+ composite PK;LOCAL_SCHEMAretained as legacy-shape referencepackages/codev/src/agent-farm/db/index.ts(−≈470 net) —getDb()/getDbPath()alias the global connection; removedensureLocalDatabase()+ the local v1–v12 migration ladder; added global migration v14packages/codev/src/agent-farm/db/consolidate.ts(+456, new) — reusable upsert-if-newer engine (planMigration/applyMigration), defensive legacy reads,_consolidationmarker, strict boot one-off (runBootConsolidation)packages/codev/src/agent-farm/db/migrate.ts(−135, deleted) — dead state.json→SQLite migrationpackages/codev/src/agent-farm/db/types.ts(+1) —DbBuilder.workspace_pathpackages/codev/src/agent-farm/state.ts(+…/−…) — builder functions workspace-scoped (deriveworkspace_pathfromworktreeon upsert; optionalworkspacePathon reads);lookupBuilderSpawningArchitectdrops the per-file openpackages/codev/src/agent-farm/utils/workspace-path.ts(+26, new) — single-sourcenormalizeWorkspacePath(dedupe of 3 copies)packages/codev/src/agent-farm/servers/tower-server.ts(+16) — boot one-off callerpackages/codev/src/agent-farm/servers/tower-utils.ts(±16) — re-exports the shared normalizerpackages/codev/src/agent-farm/servers/overview.ts(+14/−…),servers/tower-routes.ts(+4/−…),lib/builder-lookup.ts(+12/−…),commands/cleanup.ts(+5/−…) — builder-read callsites scoped by workspacepackages/codev/src/agent-farm/commands/db.ts(+48),commands/tower.ts(+54),cli.ts(+17) —afx db consolidate <path>+afx tower start --dry-run-migration__tests__/consolidate.test.ts(+246, new);state.test.ts,spec-755-lookup-builder.test.ts,overview.test.tsreworked for the single-DB model;migrate.test.ts(−264, deleted)codev/resources/arch-critical.md,arch.md,lessons-learned.mdupdatedCommits
c05a2e0d…1f2e8fa9— Plan phase (draft + gate-decision revisions)539dc93b/de357092Move state.db tables into global.db; reshape builders workspace-scopedd44fc707Consolidation engine + boot one-off +afx db consolidate+ dry-run preview3a2ee987Update tests for single-DB model (state, lookup-builder, overview)d32c0391Consolidate engine tests + fix workspace derivation (lastIndexOf)fc6f371aClean up stale state.db references in state.ts commentsf8823e53Replace introduced ternaries with if/else (no-ternary preference)65ed4788Dedupe workspace-path canonicalization into one leaf moduleTest Results
pnpm build: ✓ passpnpm test: ✓ pass — full suite 3419 passed, 48 skipped, 0 failures (agent-farm alone: 2014 passed); consolidate suite adds 8 new casesafx db consolidatedry-run + apply on a copy of the real 40-rowcodevstate.dbmigrated 40 architect rows across 38 workspaces into a throwawayglobal.db, source renamed to*.pre-merge-<ts>, re-apply a graceful no-op. Verifiedafx tower start --dry-run-migrationpreviews without spawning/mutating, and the v13→v14 migration path on a snapshot of the realglobal.db(existingterminal_sessions/known_workspacespreserved).Architecture Updates
Routed HOT (
arch-critical.md): rewrote the always-injected state fact — "State lives in.agent-farm/state.db+~/.agent-farm/global.db" → "single user-global~/.agent-farm/global.db(Issue #1118 retired per-workspace state.db; architect/builders keyed by
workspace_path)". Thisis behavior-changing and cross-cutting, so it belongs in the capped hot tier (no displacement
needed — reworded an existing entry in place).
Routed COLD (
arch.md): updated the "State inconsistency" quick-map row, the "StateConsistency" invariant, and the state-persistence section (two-databases → one
global.db;described the composite-PK reshape, the
consolidate.tsengine, and migration v14). Left thehistorical #826 v11-migration detail intact (still accurate about schema evolution).
Lessons Learned Updates
Routed COLD (
lessons-learned.md→ Architecture): a[From #1118]entry — "when asubsystem's scope outgrows its storage location, relocate the storage rather than patching the
scope with a column," plus the corollary that moving a table into a shared DB forces re-examining
its primary key (the
buildersid-collision finding), and the reusable migration techniques(upsert-if-newer, defensive
PRAGMAreads). Not routed HOT: the existing hot lesson "singlesource of truth beats distributed state — consolidate duplicates rather than syncing them"
already captures the headline; this PR is a detailed instance of it, so it belongs in cold.
Things to Look At During PR Review
buildersreshape (db/schema.ts, migration v14,state.ts) — the one structuralchange. Confirm every builder read that should be workspace-scoped is (
builder-lookup.ts,cleanup.ts,tower-routes.ts,overview.ts,loadState), and that the security-relevantlookupBuilderSpawningArchitectfilters by(workspace_path, id).deriveWorkspaceFromWorktreeuseslastIndexOf('/.builders/'), notindexOf— a bug thetests caught (a worktree path can itself sit under another
.builders/, e.g. a builder testingfrom its own worktree). Same helper duplicated in
state.tsandconsolidate.ts.runBootConsolidation) — marks done unconditionally onfirst boot even if the active
state.dbis absent/empty; satellites are recovered viaafx db consolidate. Confirm the marker + row-copy share one transaction.consolidate.tsnormalizers) — pre-v11 architect (integer id, noworkspace_path) and legacy builder rows are normalized without running the old migrationladder; enum values are clamped to satisfy the new CHECK constraints.
local migration ladder (v1–v12), consolidation is the sole reader of legacy
state.dbfilesand never brings their schema up to date first — so it must tolerate every historical
shape. This is why
readRowsusesSELECT *(neverSELECT <column>) and everymigration-added column is read as
row.<col> ?? null(or synthesized). Concretely, a verycommon field shape is v11-but-not-v12: the
architecttable hasworkspace_path(CRITICAL: Sibling architects leak across workspaces — launchInstance reconcile reads global state.db.architect without workspace filtering #826)but no
session_idcolumn (Multi-architect conversation resume: disambiguate via per-architect session UUID #832 not yet rolled out). An explicitSELECT session_idwouldthrow
no such column;SELECT *+?? nullyieldssession_id = nulland migrates cleanly.Same for the builder columns added by v8/v9/v10 (
issue_number,spawned_by_architect,typeincl.'pir'). Pinned by tests: thepre-v11case and an explicit v11-but-not-v12(no
session_id) regression test that would fail if any read ever became column-specific.db/index.tsdeletion is large (−≈470) — it's the removed local-migration ladder, foldedinto
GLOBAL_SCHEMA+ v14. Worth a scan that nothing live still referenced it.How to Test Locally
pir-1118→ View Diffcodev/plans/1118-*.md):pnpm --filter @cluesmith/codev test(state / consolidate / lookup-builder suites)afx tower start --dry-run-migration(orafx db consolidate <copy-of-state.db>) — reads, no writesNODE_ENV=test AF_TEST_DB=global-probe.dbagainst a copy of a realstate.db/global.db— exercises migration v14 + data migrate without touching live statethen stop-from-A / start-from-B and confirm A's architects are visible (the fragmentation fix)
Consultation (3-way verify, single advisory pass)
d9828577):clearRuntime()wiped all workspaces' builders — it did an unscopedDELETE FROM builders, andafx workspace stopcalls it; on the sharedglobal.dbthat deleted every workspace's builders, not just the stopping one. Fixed:clearRuntime(workspacePath)scopes the delete byworkspace_path(threaded throughstop.ts);utils/annotations(global, vestigial) are left untouched to avoid a cross-workspace wipe. Test:clearRuntime(A)leaves B's builders intact.afx db consolidaterepeat-run wasn't idempotent — re-running on the renamed sourcefataled, and an already-*.pre-merge-*archive would re-migrate + double-rename. Fixed: missing source → friendly no-op; archived file → skip. Tests added.re-run (iteration 2) added gemini (APPROVE).
Iteration 2 (re-run after the iter-1 fixes) + a manual cross-workspace audit surfaced two
more consolidation-fallout bugs, both fixed (commit
7f6ce330):3.
send.ts detectCurrentBuilderIdopened the RETIRED per-workspacestate.dbto resolvea builder's canonical id for
afx send(issue #1094 anti-spoofing). After migration renamesthat file,
afx sendfrom a worktree would break. This was the last directstate.dbopen (its siblings
lookupBuilderSpawningArchitectandoverview.tswere fixed in the mainimplementation; this one was missed). Found by the audit — all three consult models missed
it. Fixed: read
global.dbscoped byworkspace_path. Test rewritten incl. a same-idcross-workspace scoping case.
4.
afx db consolidatedry-run wasn't side-effect-free (codex iter2) — it calledgetGlobalDb(), which eagerly creates/migratesglobal.db. Fixed: dry-run opensglobal.dbread-only (or in-memory if absent); only
--applyuses the RW connection. Test added.loadState()returnsutils/annotationsunscoped (they have noworkspace_pathcolumn). These tables are vestigial (no production producers — verified), sothe exposure is a stale/empty read, not a live leak. Left as-is; noted for a future cleanup.
Iteration 3 (final re-run: claude APPROVE, gemini COMMENT, codex REQUEST_CHANGES) — two
items, both addressed (commit
c81b8f0d):5.
send.ts.buildersregexes used lazy.+?(first-match) — a nested worktree(
<repo>/.builders/a/.builders/b) would resolve the outer builder. Same class as theindexOf→lastIndexOffix; switched to greedy.+(last/.builders/) + regression test.Nesting is an unsupported anti-pattern (
afx spawnfrom inside a worktree), so this is aconsistency fix, not a normal-path bug. (Stale
detectCurrentBuilderIddocstring refreshed.)6. No direct
runBootConsolidationcoverage — added tests for the real boot path:first-boot migrate+marker+rename, marker-set no-op, and strict mark-done-when-absent.
un-notarized
@openai/codexvendor binary as malware and auto-deleted it (SIGKILL→ENOENT);restoring the binary + ad-hoc
codesignunblocked it. Upstream/packaging follow-up, not asignal about this PR.
Notes / Out of Scope
terminal shellper-integration tests fail on a stale build (they need
pnpm build'scopy-skeleton step / a live shellper) — verified they fail identically with this change stashed,
so they are unrelated. After a full
pnpm buildthe entire suite is green.session_idfor builders deliberately excluded — that's Unify builder conversation resume onto persisted session IDs (harness.session), retiring the jsonl-discovery heuristic #1112's charter (builders resumevia unique-worktree mtime discovery; no shared-cwd sibling to disambiguate). Layers on cleanly
as a later additive column.
afx prune-state/afx workspace forget(named in the issue's acceptance criteria) werecut at the plan gate — stale rows are harmless under workspace-scoped reads; cleanup is a
separable follow-up if ever needed.
normalizeWorkspacePathdedupe collapsed 3 real copies intoutils/workspace-path.ts; thebare
fs.realpathSynccalls intower-instances.ts/tower-routes.tsare a different pattern(no fallback) and were intentionally left alone.